TweetNERD - End to End Entity Linking Benchmark for Tweets
Named Entity Recognition and Disambiguation (NERD) systems are foundational for information retrieval, question answering, event detection, and other natural language processing (NLP) applications. We introduce TweetNERD, a dataset of 340K+ Tweets spanning 2010-2021, for benchmarking NERD systems on Tweets. It is the largest and most temporally diverse open-source benchmark dataset for NERD on Tweets and can be used to facilitate research in this area. We describe the evaluation setup with TweetNERD for three NERD tasks: Named Entity Recognition (NER), Entity Linking with True Spans (EL), and End to End Entity Linking (End2End); and we report the performance of existing publicly available methods on specific TweetNERD splits.
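The three tasks named in the abstract above differ mainly in what counts as a correct prediction. The following is a minimal sketch of that distinction, not the benchmark's official scorer: the `Annotation` layout, the function names, and the choice of exact-match span scoring are all assumptions made for illustration.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    # Hypothetical record layout; the real dataset schema may differ.
    tweet_id: str
    start: int
    end: int
    entity_id: str


def prf(gold: set, pred: set) -> tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between two sets of items."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def ner_scores(gold, pred):
    """NER: only the span boundaries have to match."""
    spans = lambda anns: {(a.tweet_id, a.start, a.end) for a in anns}
    return prf(spans(gold), spans(pred))


def end2end_scores(gold, pred):
    """End2End: span boundaries and linked entity both have to match."""
    return prf(set(gold), set(pred))


def el_accuracy(gold, pred_ids):
    """EL with true spans: gold spans are given, only the entity is predicted.

    pred_ids maps (tweet_id, start, end) to a predicted entity id.
    """
    if not gold:
        return 0.0
    correct = sum(
        1 for a in gold
        if pred_ids.get((a.tweet_id, a.start, a.end)) == a.entity_id
    )
    return correct / len(gold)
```

Under this sketch, a system that finds the right span but links it to the wrong entity is rewarded in NER and penalized in End2End, which is why the two scores can diverge sharply on the same predictions.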
Local LLM Ensembles for Zero-shot Portuguese Named Entity Recognition
Sarcinelli, João Lucas Luz Lima, Silva, Diego Furtado
Large Language Models (LLMs) excel in many Natural Language Processing (NLP) tasks through in-context learning but often under-perform in Named Entity Recognition (NER), especially for lower-resource languages like Portuguese. While open-weight LLMs enable local deployment, no single model dominates all tasks, motivating ensemble approaches. However, existing LLM ensembles focus on text generation or classification, leaving NER under-explored. In this context, this work proposes a novel three-step ensemble pipeline for zero-shot NER using similarly capable, locally run LLMs. Our method outperforms individual LLMs in four out of five Portuguese NER datasets by leveraging a heuristic to select optimal model combinations with minimal annotated data. Moreover, we show that ensembles obtained on different source datasets generally outperform individual LLMs in cross-dataset configurations, potentially eliminating the need for annotated data for the current task.
- South America > Brazil > Minas Gerais (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- South America > Brazil > São Paulo (0.04)
- Research Report (0.65)
- Workflow (0.47)
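The abstract for the Portuguese NER entry above does not spell out its three-step pipeline, so the following only illustrates the general idea of combining span predictions from several models. It uses plain majority voting as a stand-in; the `Span` layout, the `vote_spans` name, and the `min_votes` parameter are assumptions, not the authors' method.

```python
from collections import Counter

# Hypothetical span layout: (start offset, end offset, entity label).
Span = tuple[int, int, str]


def vote_spans(model_outputs: list[set[Span]], min_votes: int) -> set[Span]:
    """Keep every span predicted by at least `min_votes` models.

    Each model contributes a set, so it can vote for a span at most once.
    """
    counts = Counter(span for spans in model_outputs for span in spans)
    return {span for span, n in counts.items() if n >= min_votes}
```

With three models and `min_votes=2`, a span survives only if two models agree on both its boundaries and its label, which trades some recall for precision.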
LOCUS: A System and Method for Low-Cost Customization for Universal Specialization
Sundararaman, Dhanasekar, Li, Keying, Xiong, Wayne, Garg, Aashna
We present LOCUS (LOw-cost Customization for Universal Specialization), a pipeline that consumes few-shot data to streamline the construction and training of NLP models through targeted retrieval, synthetic data generation, and parameter-efficient tuning. With only a small number of labeled examples, LOCUS discovers pertinent data in a broad repository, synthesizes additional training samples via in-context data generation, and fine-tunes models using either full or low-rank (LoRA) parameter adaptation. Our approach targets named entity recognition (NER) and text classification (TC) benchmarks, consistently outperforming strong baselines (including GPT-4o) while substantially lowering costs and model sizes. Our resultant memory-optimized models retain 99% of fully fine-tuned accuracy while using barely 5% of the memory footprint, also beating GPT-4o on several benchmarks with less than 1% of its parameters.
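To give a feel for why low-rank (LoRA) adaptation is cheap, the sketch below compares the number of trainable weights in a full update of one dense layer with a rank-r update, where the update is factored into two matrices of shapes (d_out, r) and (r, d_in). The layer sizes and rank are illustrative assumptions, and the resulting fraction is not the memory figure reported in the abstract above, which depends on details not given there.

```python
def full_update_params(d_in: int, d_out: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def lora_update_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for a rank-`rank` update: (d_out x rank) + (rank x d_in)."""
    return rank * (d_in + d_out)


def trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Share of the full update's weights that the low-rank update trains."""
    return lora_update_params(d_in, d_out, rank) / full_update_params(d_in, d_out)


# Illustrative sizes only (assumed, not taken from the paper).
fraction = trainable_fraction(d_in=4096, d_out=4096, rank=8)
```

For a square 4096 x 4096 layer at rank 8, the low-rank update trains 65,536 weights instead of 16,777,216, that is, 1/256 of the full update.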
Enhancing Job Matching: Occupation, Skill and Qualification Linking with the ESCO and EQF taxonomies
Saroglou, Stylianos, Diamantaras, Konstantinos, Preta, Francesco, Delianidi, Marina, Benisis, Apostolos, Meyer, Christian Johannes
This study investigates the potential of language models to improve the classification of labor market information by linking job vacancy texts to two major European frameworks: the European Skills, Competences, Qualifications and Occupations (ESCO) taxonomy and the European Qualifications Framework (EQF). We examine and compare two prominent methodologies from the literature: Sentence Linking and Entity Linking. In support of ongoing research, we release an open-source tool, incorporating these two methodologies, designed to facilitate further work on labor classification and employment discourse. To move beyond surface-level skill extraction, we introduce two annotated datasets specifically aimed at evaluating how occupations and qualifications are represented within job vacancy texts. Additionally, we examine different ways to utilize generative large language models for this task. Our findings contribute to advancing the state of the art in job entity extraction and offer computational infrastructure for examining work, skills, and labor market narratives in a digitally mediated economy. Our code is made publicly available: https://github.com/tabiya-tech/tabiya-livelihoods-classifier
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Africa > Ethiopia (0.04)
- Banking & Finance (0.74)
- Education (0.68)
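Entity linking against a taxonomy, as described in the job-matching entry above, comes down to scoring an extracted mention against candidate labels and keeping the best match. The sketch below uses token-overlap (Jaccard) similarity purely as a simple stand-in for whatever similarity model the released tool uses; the function names, the threshold, and the example labels are hypothetical and are not taken from ESCO or EQF.

```python
from typing import Optional


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two lowercase-normalized strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def link_mention(
    mention: str, labels: list[str], threshold: float = 0.3
) -> Optional[tuple[str, float]]:
    """Return the best-matching label and its score, or None below threshold."""
    if not labels:
        return None
    best = max(labels, key=lambda label: jaccard(mention, label))
    score = jaccard(mention, best)
    return (best, score) if score >= threshold else None


# Hypothetical labels for illustration only.
example_labels = ["software developer", "data analyst", "registered nurse"]
match = link_mention("senior software developer", example_labels)
```

Returning `None` below the threshold matters in practice: a mention with no close label is better left unlinked than forced onto the nearest taxonomy entry.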